Claims

1 1. A method for assembly of a plurality of reads from a genomic region, the method

2 comprising the steps of: 

3 providing a plurality of reads from a genomic region; 

4 indexing a plurality of read subsequences for each of the plurality of reads, each 

5 subsequence having an associated read with which it corresponds; 

6 extracting from the indexed subsequences a plurality of read pairs that have a 

7 predetermined number of subsequences in common; and 

8 merging the read pairs along a continuum. 
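
By way of illustration only, the indexing and extracting steps of claim 1 might be sketched as follows. The function names, the subsequence length `k`, and the threshold `min_shared` are assumptions introduced for this sketch and are not recited in the claims.

```python
from collections import defaultdict
from itertools import combinations


def index_subsequences(reads, k):
    """Map each length-k subsequence to the set of read ids that contain it."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add(read_id)
    return index


def candidate_pairs(index, min_shared):
    """Return read pairs sharing at least min_shared distinct subsequences."""
    shared = defaultdict(int)
    for read_ids in index.values():
        for a, b in combinations(sorted(read_ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


# Hypothetical toy reads; reads 0 and 1 overlap, read 2 is unrelated.
reads = ["ACGTACGGT", "TACGGTCCA", "GGGGGGGGG"]
pairs = candidate_pairs(index_subsequences(reads, k=4), min_shared=2)
```

Only the pairs returned here would go on to the merging step, so reads that share no subsequences are never compared directly.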

1 2. The method of claim 1, comprising the step of providing a plurality of reads generated 

2 from sequencing both ends of a plurality of DNA segments, each read having associated linking 
3 information comprising an associated orientation relative to a read from an opposite end of the
4 DNA segment, and an associated distance from the read on the opposite end of the DNA

5 segment.

1 3. The method of claim 1, comprising the step of providing a plurality of reads that are reverse

2 complements of a plurality of reads provided by sequencing both ends of a plurality of DNA

3 segments.
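
As a minimal sketch of the reverse-complement step of claim 3, assuming reads are plain strings over the alphabet A, C, G, T (the helper names are hypothetical):

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def with_reverse_complements(reads):
    """Return the original reads followed by their reverse complements."""
    return list(reads) + [reverse_complement(r) for r in reads]
```

Adding the reverse complements to the input lets a single forward-strand index find overlaps between reads sequenced from opposite strands.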

1 4. The method of claim 1, comprising the step of sorting the indexed read subsequences.

1 5. The method of claim 1, comprising the step of discarding read subsequences having more

2 than a cutoff number of occurrences from the plurality of indexed subsequences.
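
The discarding step of claim 5 could, under the assumptions stated here, look like the sketch below. The function names and the example cutoff are illustrative choices, not values taken from the claims.

```python
from collections import Counter


def subsequence_counts(reads, k):
    """Count every occurrence of each length-k subsequence across all reads."""
    counts = Counter()
    for seq in reads:
        for pos in range(len(seq) - k + 1):
            counts[seq[pos:pos + k]] += 1
    return counts


def drop_frequent(counts, cutoff):
    """Keep only subsequences occurring no more than `cutoff` times."""
    return {sub: n for sub, n in counts.items() if n <= cutoff}


# Hypothetical toy input: "AAA" occurs four times and is discarded.
counts = subsequence_counts(["AAAAA", "AAACG"], k=3)
kept = drop_frequent(counts, cutoff=2)
```

Subsequences that occur very often tend to come from repeats, so removing them keeps the number of candidate read pairs manageable.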

1 6. The method of claim 1, comprising the step of indexing a plurality of read subsequences

2 of a predetermined length for each of the plurality of reads. 

1 7. The method of claim 6, wherein the predetermined length for each of the plurality of 

2 reads is between about 12 and about 32 bases long. 

1 8. The method of claim 1, comprising the step of indexing a plurality of subsequences for

2 each read, the index comprising for each subsequence an associated read and an associated 

3 position on the read with which it corresponds. 

1 9. The method of claim 1, comprising the step of performing alignments on the plurality of 

2 read pairs having a predetermined number of subsequences in common. 
1 10. The method of claim 8, comprising the steps of: 
2 performing alignments on the plurality of read pairs having a predetermined number of

3 subsequences in common; and 

4 using the associated position on the reads with which the subsequences correspond to 

5 verify overlap. 
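
One possible reading of claims 8 and 10 is sketched below: the index stores a position with each subsequence, and the positions of shared subsequences are compared to see whether they imply a single consistent offset between two reads. The function names and the simple base-by-base check (used here in place of a full alignment) are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def index_with_positions(reads, k):
    """Map each length-k subsequence to a list of (read id, position) entries."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((read_id, pos))
    return index


def dominant_offset(index, a, b):
    """Return (offset, support) for the most common pos_a - pos_b, or None."""
    offsets = Counter()
    for hits in index.values():
        pos_a = [p for r, p in hits if r == a]
        pos_b = [p for r, p in hits if r == b]
        for pa in pos_a:
            for pb in pos_b:
                offsets[pa - pb] += 1
    return offsets.most_common(1)[0] if offsets else None


def overlap_verified(read_a, read_b, offset):
    """Check that read_b, placed at `offset` along read_a, agrees at every shared base."""
    compared = 0
    for i, base in enumerate(read_b):
        j = i + offset
        if 0 <= j < len(read_a):
            if read_a[j] != base:
                return False
            compared += 1
    return compared > 0


# Hypothetical toy reads overlapping by six bases.
reads = ["ACGTACGGT", "TACGGTCCA"]
index = index_with_positions(reads, k=4)
offset, support = dominant_offset(index, 0, 1)
```

A real implementation would likely tolerate mismatches from sequencing errors; the exact-match check above is kept strict only to keep the sketch short.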

1 11. The method of claim 2, comprising the step of using the linking information associated 

2 with the reads to confirm that the merged pairs are merged correctly. 

1 12. The method of claim 2, comprising the step of using the associated linking information to

2 identify an ambiguity in the merged reads.

1 13. The method of claim 12, comprising the step of identifying a repeat region and a set of 

2 unique regions. 

1 14. The method of claim 13, comprising the step of linking pairs of unique regions using the 

2 linking information associated with the reads in the unique regions. 

1 15. The method of claim 14, comprising the step of inserting the repeat region between each 

2 linked pair of unique regions with which the repeat region corresponds. 
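
The linking and inserting steps of claims 14 and 15 might be sketched as follows. This is an illustrative sketch only: it assumes each read has already been assigned to a unique region, that each mate pair is given in (upstream, downstream) order, and that a minimum number of supporting links (`min_links`, a hypothetical parameter) is required.

```python
from collections import Counter


def link_unique_regions(region_of_read, mate_pairs, min_links):
    """Return ordered (upstream, downstream) region pairs supported by enough mate pairs."""
    links = Counter()
    for up_read, down_read in mate_pairs:
        up = region_of_read.get(up_read)
        down = region_of_read.get(down_read)
        if up is not None and down is not None and up != down:
            links[(up, down)] += 1
    return [pair for pair, n in links.items() if n >= min_links]


def insert_repeat(linked_pairs, regions, repeat):
    """Place a copy of the repeat between each linked pair of unique regions."""
    return [regions[up] + repeat + regions[down] for up, down in linked_pairs]


# Hypothetical toy data: one repeat flanked by two pairs of unique regions.
regions = {"U1": "ACGT", "U2": "TTGA", "U3": "CCAG", "U4": "GGTC"}
region_of_read = {"r1": "U1", "r2": "U2", "r3": "U1", "r4": "U2",
                  "r5": "U3", "r6": "U4", "r7": "U3", "r8": "U4"}
mate_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]

linked = link_unique_regions(region_of_read, mate_pairs, min_links=2)
resolved = insert_repeat(linked, regions, repeat="ATATAT")
```

Because the mate pairs span the repeat, they indicate which unique region on one side belongs with which unique region on the other, which the overlap information alone cannot decide.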

1 16. The method of claim 13, comprising the step of merging linked pairs of unique regions 

2 using the linking information associated with the reads in the unique regions. 

1 17. A method for assembly of merged reads from a genomic region, the method comprising 

2 the steps of: 

3 providing one or more sets of merged reads from a genomic region comprising a set of 

4 reads having associated linking information; 

5 using the associated linking information to identify an ambiguity in the merged reads; 

6 identifying a repeat region and a set of unique regions; and 

7 linking pairs of unique regions using the linking information associated with the reads in 

8 the unique regions. 

1 18. The method of claim 17, comprising the step of inserting the repeat region between each

2 linked pair of unique regions with which the repeat region corresponds. 

1 19. The method of claim 17, comprising the step of merging linked pairs of unique regions 

2 using the linking information associated with the reads in the unique regions. 

1 20. An article of manufacture having computer-readable program means embodied thereon 

2 for assembly of a plurality of reads from a genomic region, the article comprising: 

3 computer-readable program means for providing a plurality of reads from a genomic 
4 region;

5 computer-readable program means for indexing a plurality of read subsequences for each 

6 of the plurality of reads, each subsequence having an associated read with which it corresponds; 

7 computer-readable program means for extracting from the indexed subsequences a 

8 plurality of read pairs that have a predetermined number of subsequences in common; and 

9 computer-readable program means for merging the read pairs along a continuum. 

1 21. An article of manufacture having computer-readable program means embodied

2 thereon for assembly of merged reads from a genomic region, the article comprising: 

3 computer-readable program means for providing one or more sets of merged reads from a 

4 genomic region comprising a set of reads having associated linking information; 

5 computer-readable program means for using the associated linking information to 
6 identify one or more ambiguities in the merged reads;

7 computer-readable program means for identifying a repeat region and a set of unique

8 regions; and

9 computer-readable program means for linking pairs of unique regions using the linking

10 information associated with the reads in the unique regions. 